Restaurants, Queries, and Statistical Learning

8 years ago

This project is part of a large "Machine Problem" as part of UBC CPEN 221. Paired with another student, this programming assignment would act as the capstone to our endeavors and require everything we learned about Java programming and software development to:

Design and manage complex abstract data types
Implement client-server query support
Implement multi-threading with proper concurrency
Read and write structured data (JSON)

You can visit the Github project here.

K-Means Clustering Visualization

This project had five major components:

Designing an in-memory database
Perform some statistics on our data
Build a multi-threaded server
Use the server to perform some basic preset operations
Query the database with a custom grammar

Based upon the Yelp Academic Dataset, we first began by discussing a way to build a database that suited our needs while being flexible in the future. Our first concrete idea was to build a recursive datatype called a table, mirroring somewhat to that of MySQL, but would support another table as a value. One of the issues we encountered with this approach was that it was ambiguous what object type would be pulled if we called get() on a certain key. Avoiding casting errors and giving up some of the generality of our database design, we opted to use a single database interface that would contain Collections of specific classes. In our case, we implemented a YelpDb class that internally manages a list of users, restaurants, and reviews. Using more object-oriented practices, there are abstract business, user, and review interfaces, which enable us to implement YelpUser, YelpRestaurant, and YelpReview classes, each with specific function and capabilities.

Using this database design, it enabled us to use Jackson to easily parse incoming JSON into their respective classes, having to implement custom serializers and deserializers because Jackson by default calls method with the same key as the JSON keys. Example: getuser_id() rather than the more appealing getUserId(). Jackson was immense help because it automatically converts JSON strings to Objects and the reverse is also possible, saving us much time building a custom JSON parser.

Thinking about servers and concurrency was probably one of the more interesting things about this project. The basic networking code was graciously provided by our professor, but the nitty and gritty portion was pretty extensive. Having to support preset commands, retrieving the proper data, ensuring data insertion from the command line would be parsed properly and handling edge cases/exceptions was a hassle. However, it did provide much insight as to how command-line applications functions for their input statements.

The last, and arguably the most difficult portion of the project, was using ANTLR to interpret custom queries. With a sample input query like in(Telegraph Ave) && (category(Chinese) || category(Italian)) && price <= 2, my programming partner spent much time with regex and graph theory to construct a list of requirements that matched the query. I then interpreted these requirements and pulled all the restaurants that matched them all.

My partner and I learned a lot from this project alone. Thinking about design, implementing our ideas, and weeding out any bugs was pretty much a core development experience. I found this machine problem to be particularly helpful in my co-op position, where I actively work with SQL and (fundamentally) database input/output functionality.

Add a comment