
Scala Big Data Examples




Thought I'd kick off the Java resource section with some pretty useful concepts, using the Spark shell to run Scala scripts.


Scala runs on the JVM and interoperates with Java, but it also supports functional programming. If you aren't aware of what that is, well, to give you an idea, it's a bit like Kotlin.

These examples are fairly simple and are test material. Here are the questions:

Big Data exam#2.docx


These are my solutions for the exam (note: I received a 100 as a grade):




Here is what the output looks like:


make: mercedes-benz uses diesel and has maximum price of 31600
MALES IN US WITH 50k+: 6099
Females IN US WITH 50k+: 1072
MALES IN Canada WITH 50k+: 30
Females IN Canada WITH 50k+: 30


The purpose of this was basically to use map() from the Spark shell. To understand it, you first need to understand what Hadoop's MapReduce does. flatMap() is the closest match to a Hadoop Mapper: each input record can produce zero or more output records.

map() is the simpler case: it produces exactly one output per input. In Spark, both run in parallel across the partitions of the data set, much like MapReduce spreads its map tasks across a cluster, so they are really efficient for large data sets. For example, Amazon's database of customer orders could be one use case for Scala and map().
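To make the difference between map() and flatMap() concrete, here is a minimal sketch using plain Scala collections, so it runs without Spark. The object name and the sample lines are made up for illustration and are not from the exam. In the Spark shell, `map` and `flatMap` are called the same way on an RDD, and the grouping step at the end would typically be `reduceByKey`.

```scala
object MapVsFlatMap {
  // Hypothetical sample input, not taken from the exam data.
  val lines: List[String] = List("to be or", "not to be", "")

  // map: exactly one output per input element (here, the line length).
  val lengths: List[Int] = lines.map(_.length)

  // flatMap: zero or more outputs per input element, flattened into one list.
  // The empty line contributes no words at all.
  val words: List[String] = lines.flatMap(_.split(" ").filter(_.nonEmpty))

  // The "reduce" side: pair each word with 1, group by word, sum per group.
  val counts: Map[String, Int] =
    words
      .map(w => (w, 1))
      .groupBy(_._1)
      .map { case (word, ones) => (word, ones.map(_._2).sum) }
}
```

Note that on plain collections like these nothing runs in parallel; the parallelism comes from Spark splitting an RDD into partitions.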


From what my friends told me, they thought it was hard, because none of them could get complete solutions. They also told me most of the other classmates were there for a long time. I'm sure some of them got it to work. I did not think it was too hard; I had it done in 50 minutes. I would say if you can understand this and grasp the concepts, then you will be able to do some nifty stuff using functional programming to handle your big data sets 🙂


15 hours ago, Kai said:

Not even sure what this is? but yeah good thread.

It's a Scala script. Scala runs on the JVM alongside Java, sort of like Kotlin (functional programming).

It will take the input from the link provided in download #1 and spit out an output. It's useful for big data problems. For example, 70000 inputs are being processed in parallel (using map() in the Spark shell).

The concepts could be useful to know when dealing with really large databases. 
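As a rough idea of the filter-then-count pattern behind output like the "50k+" lines above, here is a small sketch on plain Scala collections. The `Person` case class, its fields, and the rows are all hypothetical; the real data set's columns aren't shown in this thread, and this is not the exam solution itself.

```scala
object CensusSketch {
  // Hypothetical record shape; the actual columns of the exam data are an assumption.
  final case class Person(sex: String, country: String, over50k: Boolean)

  // Made-up rows for illustration only.
  val people: List[Person] = List(
    Person("Male", "United-States", true),
    Person("Female", "United-States", true),
    Person("Male", "Canada", false),
    Person("Male", "United-States", true)
  )

  // Keep only the 50k+ rows, key each one by (country, sex), then count per key.
  val highEarners: Map[(String, String), Int] =
    people
      .filter(_.over50k)
      .map(p => ((p.country, p.sex), 1))
      .groupBy(_._1)
      .map { case (key, ones) => (key, ones.map(_._2).sum) }
}
```

On an RDD in the Spark shell the same shape would be `filter`, then `map` to key/value pairs, then `reduceByKey(_ + _)`, with each step spread across partitions.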

16 hours ago, Kai said:

Not even sure what this is? but yeah good thread.

It's an example of using Scala to handle large data sets, i.e. large pieces of information like the ones that would be stored as player files in a database, or even something as simple as 100k customers that have used someone's service. When handling large data sets there are major dos and don'ts in any language, general practices that you should follow to maximize efficiency.

This is just a working example in Scala.


@Guruu nice example man, hoping to see more quality content like this from you! A+


