Attempting to create my own data structure

So, I've been wondering, is there a best way to store data? - Obviously that's a fairly debatable subject. However, I've been thinking about data structures, and what would be the best generic data structure, I mean what would be the best data structure for 'n' data type? Or 'n' amount of data too.

I can imagine something like an array being exceptional for storing small amounts of data, although I'm not 100% sure if that is the case, I am no expert, by any means. This is why I've decided to start my personal project of creating my own abstract data structure, I feel that this will both help develop my programming abilities, as well as my theoretical understanding.

I've got a fairly basic, but good enough understanding of complexity theory, like with data structures, there are some that are better for searching, some are better for the amount of space required to store the input data. Luckily for me, my lecturer went over the basics of the complexity of some basic and generic data structures, so I have a basic enough idea.

I have the idea of implementing a list like tree, but I can imagine that being terrible in terms of the amount of space required, it would be something like O(N^n). Please correct me if I'm wrong, but I'm I would've imagined that would be the case? - I want to try and develop something that's excellent for querying, as well as the amount of space required. If I can possibly manage it, I'd like to try and develop my own implemented data structure that's both O(N) for the amount of time taken to store and search the data.

I would also like to cheat a little bit by implementing a sorting algorithm that sorts the data as it's input into the data structure. If possible, I'd like to develop some form of data structure that's capable of storing the data, then somehow optimising the stored data itself, but I would also imagine there'd be the problem of searching the structure's data, whilst it's optimising itself.

This is nothing more than a project, my initial idea was a star like tree, you'd have a central node, where it would have (for arguments sake) 5 branches coming from the one central node, then from every child node of the central node, you could have another 5 branches. Would this work well, in theory, of course half of this question purely depends on how it's implemented, but let's say it's implemented perfectly, just for the sake of debate. How effective would it be for searching, sorting and would it require a lot of space? - Like I've said, my understanding of complexity theory is basic, but what I know is good enough for me to learn from others.

There's no such thing as "best", it depends on what you're trying to do. Simply saying you're storing n datapoints is not enough. What does the data look like? What are you doing with the data?

Just an example suppose you have data streaming in and you wish to compute a moving average on the most recent 60 data points, which are just doubles. Which data structure would you use in this case?

1 Like

Well that's an interesting part of my project, I'm trying to develop something that would work for just about anything, I haven't designed any form of data as of yet, but I'm trying to think of a theoretical way where I could store any form of data. I mean I know this isn't realistic, but I'm just trying to form a theoretical way where you can store any data, and the system should still perform as expected.

In respect as to what I would be doing with the data, is mostly just storing it, rarely changing it, so it's somewhat dynamic data, but this isn't a huge focus, the data will just be somewhat dynamic.

If I knew how much data I would be storing, then I'd probably implement my own form of array, but a key point of the data structure that I'm trying to develop, it should be able to handle any amount of data. Obviously performance drops when you increase the size of your input, but the input is of n size. And as it's somewhat dynamic, n will change, it should be able to handle deletion and editing fairly well, although the structure doesn't have to focus on the data being dynamic.

I know what I'm asking is hard to answer, I appreciate that, and considering what I'm trying to think of, it's not realistic, but essentially, it's the perfect data structure. I mean it will essentially have all of the advantages of the different types (tree, heaps, graphs, lists etc.) , and as little disadvantages as possible.

I mean I know this may be unrealistic, but I would like to test myself with this project, both with my theory and my practical. See just how efficient I can make a data structure, and see how complex I can make it, in addition to seeing how well it works, and how hard it is for others to implement, this is a summer project for me.

Trying to say 'best' is algo dependent. Some algos don't even need a full history of data, just like in the moving average example I gave you.

What you're asking is sort of like: How do I make the best car?

What might be the best for a race circuit wouldn't necessarily be the best for transporting kids in.

I know, like I with my second reply, I know what I'm asking is very hard to answer.

And that's a good comparison in terms of what I'm asking, I appreciate how it's almost impossible to answer because there's no specification in terms of what it'll be used for, nor is there any specification on the data itself.

Can you personally think of a way in which you could develop a data structure that utilises all of the bonuses of other data structures? - I mean in theory of course, I'm not asking you to go away and develop a whole new data structure that matches what I've asked. But I guess it's just as challenging, me asking you to think of the theory behind such a data structure.

Usually data structures are built for a purpose and that purpose is (very often) to excel to do a thing (searching, storing, ...). A jack of all trade data structure, or "perfect" most of the time is not needed, because you usually want to optimize one aspect of your structure and that's what matters.

No, because a lot of the features would be mutually exclusive.

I totally agree with you there, it's impossible to disagree with such a statement, I'm just trying to think of a way in which I can store n amount of data, very quickly, sort the data very quickly, search the data very quickly, meanwhile the data and the structure itself uses as little space as possible.

I'm trying to think outside the box that holds the box, I'm trying to achieve something unrealistic here.

You usually make a tradeoff between storing data and accessing data, so I don't think it's possibile to achieve both.

I agree, yet again, I know what I'm asking s the impossible. But for now, I'd like to focus more around the accessing area than the sorting area, as I may think of a way where I could sort the data when the system is not in use.

Ok then. So as you already know memory access is by far the slowest operation you could do, even if the data structure is loaded in RAM it's still a lot slower that let's say an arithmetic operation, let alone accessing data from mass storage. So a clever thing you could do is to chose a really compact representation and store it in an array in the stack. Depending on the amount of data and type of data that needs to be stored, it is possibile to store you structure inside the CPU cache, making accessing waaay faster.

One Structure to rule them all, One Structure to find them,
One Structure to bring them all, and in the darkness bind them,