• conditional_soup@lemm.ee
    2 days ago

    The client wants to drag and drop their own personalized Excel file, with no guaranteed formatting, column order, or data contract, to import their data into our system <3
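
One common defensive step for files with no guaranteed column order is to locate columns by header name rather than by position, and to fail loudly when a required one is missing. A minimal sketch of that idea; the alias table, field names, and `map_headers` are hypothetical and not taken from any real system in this thread:

```python
# Hypothetical alias table: canonical field -> header spellings we accept.
COLUMN_ALIASES = {
    "customer_id": {"customer id", "cust id", "id"},
    "amount": {"amount", "total", "sum"},
}


def map_headers(headers):
    """Return {canonical_name: column_index} for a header row in any order."""
    found = {}
    for index, raw in enumerate(headers):
        # Normalize case, surrounding whitespace, and underscores before matching.
        key = str(raw).strip().lower().replace("_", " ")
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases:
                if canonical in found:
                    raise ValueError(f"ambiguous column for {canonical!r}: {raw!r}")
                found[canonical] = index
    missing = sorted(set(COLUMN_ALIASES) - set(found))
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return found
```

Rejecting the file with a clear error is deliberate here: guessing at a missing or duplicated column tends to be worse than sending the file back.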

        • 7dev7random7@suppo.fi
          1 day ago

          May I?

          A controlling department wasn’t granted any money for digitizing their workflow.

          So these guys created their own solution(s!): things like dedicated “user interfaces” that load data from hand-built tables. After years, these people realized that data formatting is quite the issue.

          They started to put random rules into different tables:

          Two empty lines: New Group Data Record. One empty line: New Subgroup Data Record.

          Excel tables aggregating this data via hardcoded links.

          A dedicated table to start calculations on parent tables.
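
For anyone who has to read files that follow the blank-line convention above, it can be parsed mechanically. A rough sketch, assuming rows arrive as lists of cell values and that "two empty lines" means two or more consecutive blank rows; `split_groups` is a made-up name, not part of their tooling:

```python
def split_groups(rows):
    """Split rows into groups of subgroups using blank-row separators.

    Assumed convention: one blank row starts a new subgroup,
    two or more consecutive blank rows start a new group.
    """
    groups = [[[]]]  # groups -> subgroups -> rows
    blanks = 0
    for row in rows:
        # A row counts as blank when every cell is None or whitespace.
        if all(cell is None or str(cell).strip() == "" for cell in row):
            blanks += 1
            continue
        if blanks >= 2:
            groups.append([[]])
        elif blanks == 1:
            groups[-1].append([])
        blanks = 0
        groups[-1][-1].append(row)
    # Drop empty placeholders left by leading blank rows or empty input.
    return [[sub for sub in group if sub] for group in groups if any(group)]
```

The fragility is visible in the code: a single stray blank row silently changes which subgroup a record belongs to.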

          They mutated data like this:

          Load data from several Excel files into one. Manually delete, add, or change lines (or columns). Start a collection run from a dedicated Excel file, load the new Excel file data, and replace the old Excel file data.

          They had files where ‘it was easier to read’ when they pivoted the data. This was troublesome since some values are intermediate results: dropping one column may imply dropping another one as well.

          All workflows required manual alignments along the way.

          They were only able to process 10% of a year’s data within a year, while managing millions in cash.

          Their data input came from different internal sources: programs written once, two decades ago, without any tests. Think VB, macros from host servers, and copy-pasted data from other internal programs.

          And don’t get me started on customer tables… They created a zip-code-encoded filesystem hierarchy where each customer’s data (you guessed it, an Excel file) was renamed and then saved. Each of these directories also held randomly named files from whenever something went wrong, so there were no actual file patterns to rely on.

          I respect them.

          They created a diagram of their tables with Word. Word! (I didn’t know either: you can select the web view in the bottom right corner and you get an infinite canvas…) Madness.

        • Trarmp@feddit.nl
          20 hours ago

          I had a potential client, an accountant. They had their own, uh, system within a spreadsheet. They wanted me to program another system that would send their spreadsheet output to our government’s IRS. We did a little back-and-forth, but I could not convince them to drop the idea.

    • veroxii@aussie.zone
      2 days ago

      Strangely enough, we actually solved this problem with AI a few months back. We upload the Excel file to Gemini with a prompt to extract the data we need in a specific JSON format. And it works surprisingly well.

      • conditional_soup@lemm.ee
        2 days ago

        How well? Bet-your-life-on-it well, or “fewer hallucinations than we would have guessed” well? I’ve considered and toyed around with OpenAI models for logging supply room check-offs in a JSON format, and it went better than I hoped but worse than I needed.

        • veroxii@aussie.zone
          2 days ago

          Really well. Temp turned down all the way, and Gemini has this new feature to execute code… Not function calling… It can write a small Python script, run it, and return the output.

          So our prompt explains the Excel spreadsheet, then tells it exactly the format we need, and then tells it to use Python and pandas to read in the CSV, clean it up, and reshape it to match what we expect, and voilà.

          So hallucinations are not really an issue with the data, as it’s simply writing code which then deterministically processes and returns the data.

          Edit to add more info: basically Gemini can create and run a lambda function on the fly. And if you’re a coder you can really guide the prompt. E.g. “Load this into pandas. Then remove all the empty columns. Also remove the total rows. Now unpivot the data so the months are not columns but in separate rows with a column called month.”

          You get the idea.
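
For reference, the steps named in that example prompt correspond to a few lines of pandas. This is only a sketch of the kind of script the model might produce, not what Gemini actually generates; the `item` label column, the `value` column name, and the rule that total rows are those whose label starts with "total" are all assumptions:

```python
import pandas as pd


def tidy_monthly(df, id_col="item"):
    """Drop empty columns and total rows, then unpivot month columns into rows."""
    # Remove columns in which every cell is missing.
    df = df.dropna(axis="columns", how="all")
    # Assumption: a total row is one whose label starts with "total".
    is_total = df[id_col].astype(str).str.strip().str.lower().str.startswith("total")
    df = df[~is_total]
    # Unpivot: one row per (item, month) instead of one column per month.
    return df.melt(id_vars=[id_col], var_name="month", value_name="value")
```

Called on a frame with columns `item`, `Jan`, `Feb` plus an all-empty column, it returns one row per item and month with the columns `item`, `month`, and `value`.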

      • Echo Dot@feddit.uk
        2 days ago

        It would still have to be in at least a somewhat consistent format. Even a human would require that.

        If they’re just going to write the details however they feel on any particular day and then expect someone or something to be able to interpret that, they’re going to have a bad time.