What is a Data Source? Definitions and Examples (2024)

A data source is the location from which the data being used originates.

A data source may be the initial location where data is created or where physical information is first digitized. However, even the most refined data may serve as a source, as long as another process accesses and uses it. Concretely, a data source may be a database, a flat file, live measurements from physical devices, scraped web data, or any of the many static and streaming data services available across the internet.

Here’s an example of a data source in action. Imagine a fashion brand selling products online. To display whether an item is out of stock, the website gets information from an inventory database. In this case, the inventory tables are a data source, accessed by the web application which serves the website to customers.
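The inventory example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table name `inventory`, its columns, the SKUs, and the helper `is_in_stock` are hypothetical, and an in-memory SQLite database stands in for whatever inventory store the brand actually runs.

```python
import sqlite3


def is_in_stock(conn: sqlite3.Connection, sku: str) -> bool:
    """Return True if the inventory table reports at least one unit for this SKU."""
    row = conn.execute(
        "SELECT quantity FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    # An unknown SKU (no row) is treated as out of stock.
    return row is not None and row[0] > 0


# In-memory database standing in for the real inventory data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("DRESS-001", 12), ("SCARF-007", 0)],
)

for sku in ("DRESS-001", "SCARF-007"):
    label = "in stock" if is_in_stock(conn, sku) else "out of stock"
    print(f"{sku}: {label}")
```

Here the web application never needs to know how the inventory data was produced; it only needs a way to reach the source and ask a question.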

Focusing on how the term is used in the familiar database management context will help to clarify what kinds of data sources exist, how they work, and when they are useful.

Data source nomenclature

Databases remain the most common data sources, as the primary stores for data in ubiquitous relational database management systems (RDBMS). In this context, an important concept is the Data Source Name (DSN). The DSN is defined within destination databases or applications as a pointer to the actual data, whether it exists locally or on a remote server (and whether in a single physical location or virtualized). The DSN is not necessarily the same as the relevant database name or file name; rather, it is an address or label used to easily reach the data at its source.

Ultimately, the systems doing the ingesting (of data) determine the context for any discussion around data sources, so definitions and nomenclature vary widely and may be confusing. This is especially true in more technical documentation. For example, within the Java software platform, a ‘DataSource’ refers specifically to an object that serves as a factory for connections to a database (like an extensible, programmatically packaged DSN). Meanwhile, some newer platforms use ‘DataSource’ more widely to mean any collection of data which provides a standardized means for access.

Data source types

Though the diversity of content, format, and location for data is only increasing with contributions from technologies such as IoT and the adoption of big data methodologies, it remains possible to classify most data sources into two broad categories: machine data sources and file data sources.

Though both share the same basic purpose, pointing to the data’s location and describing similar connection characteristics, machine and file data sources are stored, accessed, and used in different ways.

Machine data sources

Machine data sources have names defined by users, must reside on the machine that is ingesting data, and cannot be easily shared. Like other data sources, machine data sources provide all the information necessary to connect to data, such as relevant software drivers and a driver manager, but users need only ever refer to the DSN as shorthand to invoke the connection or query the data.

The connection information is stored in environment variables, database configuration options, or a location internal to the machine or application being used. An Oracle data source, for example, will contain a server location for accessing the remote DBMS, information about which drivers to use, the driver engine, and any other relevant parts of a typical connection string, such as system and user IDs and authentication.
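As a rough sketch of the idea above, a machine data source can be pictured as a named entry in a machine-local registry that is resolved to connection attributes on demand. The registry text below follows the INI style used by unixODBC's odbc.ini, but the DSN name `inventory_prod`, the driver name, the server, and the helper `resolve_dsn` are all hypothetical values chosen for illustration.

```python
import configparser

# Hypothetical contents of a machine-local DSN registry (INI style, as in
# unixODBC's odbc.ini). Every name and value here is illustrative.
REGISTRY_TEXT = """
[inventory_prod]
Driver = Example Oracle Driver
Server = db.example.internal
Port = 1521
Database = INVENTORY
"""


def resolve_dsn(name: str, registry_text: str = REGISTRY_TEXT) -> dict:
    """Look up a DSN by name and return its connection attributes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep attribute names exactly as written
    parser.read_string(registry_text)
    if not parser.has_section(name):
        raise KeyError(f"no data source named {name!r} on this machine")
    return dict(parser[name])


# The caller refers only to the DSN; the details stay in the registry.
attrs = resolve_dsn("inventory_prod")
print(attrs["Server"])
```

Because the registry lives on one machine, a DSN that resolves here would not resolve on another host unless the same entry were defined there too.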

File data sources

File data sources contain all of the connection information inside a single, shareable computer file (typically with a .dsn extension). Users do not decide which name is assigned to file data sources, as these sources are not registered to individual applications, systems, or users, and in fact do not have a DSN like that of machine data sources. Each file stores a connection string for a single data source.

File data sources, unlike machine sources, are editable and copyable like any other computer file. This allows users and systems to share a common connection (by moving the data source between individual machines or servers) and streamlines data connection processes (for example, by keeping a source file on a shared resource so it may be used simultaneously by multiple applications and users).

It is important to note that ‘unshareable’ .dsn files also exist. These are the same type of file as described above, but they exist on a single machine and cannot be moved or copied. These files point directly to machine data sources. This means that unshareable file data sources are wrappers for machine data sources, serving as a proxy for applications which expect only files but also need to connect to machine data.
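The two kinds of .dsn file described above can be told apart with a short script. The sketch below assumes the common ODBC layout in which keywords sit under an `[ODBC]` section and an unshareable file holds only a `DSN` keyword; the file contents, driver name, and helper names are hypothetical, and joining the pairs with semicolons is a simplification of how a driver manager actually assembles a connection string.

```python
import configparser

# Hypothetical shareable file data source: full connection details inline.
SHAREABLE_DSN = """\
[ODBC]
DRIVER=Example SQL Driver
SERVER=db.example.internal
DATABASE=inventory
"""

# Hypothetical unshareable file data source: only a pointer to a machine DSN.
UNSHAREABLE_DSN = """\
[ODBC]
DSN=inventory_prod
"""


def parse_file_dsn(text: str) -> dict:
    """Read the keyword/value pairs from the [ODBC] section of a .dsn file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve keyword case
    parser.read_string(text)
    return dict(parser["ODBC"])


def to_connection_string(attrs: dict) -> str:
    """Join keyword/value pairs into a simple semicolon-separated string."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def points_to_machine_dsn(attrs: dict) -> bool:
    """True when the file is only a wrapper around a machine data source."""
    return {key.upper() for key in attrs} == {"DSN"}


for text in (SHAREABLE_DSN, UNSHAREABLE_DSN):
    attrs = parse_file_dsn(text)
    print(to_connection_string(attrs), "| wrapper:", points_to_machine_dsn(attrs))
```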

How data sources work

Data sources are used in a variety of ways. Data can be transported thanks to diverse network protocols, such as the well-known File Transfer Protocol (FTP) and HyperText Transfer Protocol (HTTP), or any of the myriad Application Programming Interfaces (APIs) provided by websites, networked applications, and other services.

Many platforms use data sources with FTP addresses to specify the location of data to be imported. For example, in the Adobe Analytics platform, a file data source is uploaded to a server using an FTP client, then a service utilizes this source to move and process the relevant data automatically.

Plain FTP sends usernames, passwords, and content unencrypted. When these need to be protected in transit, SFTP (the SSH File Transfer Protocol, often read as Secure FTP) may be used instead. Alternatively, FTPS achieves the same goal by adding Transport Layer Security (TLS) to FTP.
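Since the address of a data source already names its protocol, a consuming application can check the transport before importing anything. The following is a minimal sketch using only Python's standard URL parser; the URLs, the helper `describe_source`, and the choice of which schemes count as encrypted are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Schemes treated here as encrypting credentials and content (assumption).
ENCRYPTED_SCHEMES = {"sftp", "ftps", "https"}


def describe_source(url: str) -> dict:
    """Split a data source address into protocol, host, and path."""
    parts = urlparse(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "encrypted": parts.scheme in ENCRYPTED_SCHEMES,
    }


# Hypothetical data source addresses.
for url in (
    "ftp://files.example.com/exports/orders.csv",
    "sftp://etl@files.example.com:22/exports/orders.csv",
):
    info = describe_source(url)
    print(info["protocol"], info["host"], info["path"], info["encrypted"])
```

A check like this only inspects the address; the actual transfer would still be carried out by an FTP, FTPS, or SFTP client.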

Meanwhile, many and diverse APIs are now provided to manage data sources and how they are used in applications. APIs are used to programmatically link applications to data sources, and typically provide more customization and a more versatile collection of access methods. For example, Spark provides an API with abstract implementations for representing and connecting to data sources, from barebones but extensible classes for generic relational sources, to detailed implementations for hard-coded JDBC connections.
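The general pattern of an abstract data source with concrete implementations can be sketched without any framework. The classes below are not Spark's API; `DataSource`, `CsvTextSource`, `SqliteSource`, and `count_rows` are hypothetical names used to show how one interface can cover both a flat file and a relational source.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator


class DataSource(ABC):
    """Minimal common interface: every source can yield rows as dicts."""

    @abstractmethod
    def rows(self) -> Iterator[dict]:
        ...


class CsvTextSource(DataSource):
    """Flat-file source; CSV text is held in memory to keep the sketch runnable."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self._text))


class SqliteSource(DataSource):
    """Relational source backed by an open SQLite connection and a query."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self._conn = conn
        self._query = query

    def rows(self) -> Iterator[dict]:
        cursor = self._conn.execute(self._query)
        columns = [column[0] for column in cursor.description]
        for record in cursor:
            yield dict(zip(columns, record))


def count_rows(source: DataSource) -> int:
    """Consumer code depends only on the interface, not on the source type."""
    return sum(1 for _ in source.rows())


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT, quantity INTEGER)")
db.execute("INSERT INTO inventory VALUES ('A', 1)")

sources = [
    CsvTextSource("sku,quantity\nA,1\nB,0\n"),
    SqliteSource(db, "SELECT sku, quantity FROM inventory"),
]
print([count_rows(source) for source in sources])
```

The design choice is that consumers such as `count_rows` never branch on where the data lives, which is the flexibility the paragraph above attributes to API-based access.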

Other protocols and interface styles for moving data from sources to destinations, especially on the web, include NFS, SMB, SOAP, REST, and WebDAV. These are often used within APIs (and some APIs themselves make use of other APIs internally), within fully featured data applications, or as standalone transfer processes. Each has characteristic features and security concerns which should be considered for any data transfer.

The purpose of a data source

Ultimately, data sources are intended to help users and applications connect to and move data to where it needs to be. They gather relevant technical information in one place and hide it so data consumers can focus on processing their data and identifying how best to use it.

The purpose here is to package connection information in a more easily understood and user-friendly format. This makes data sources critical for integrating disparate systems, as they save stakeholders from the need to deal with and troubleshoot complex but low-level connection information.

And although this connection information is hidden, it is always accessible when necessary. Additionally, this information is stored in consistent locations and formats which can ease other processes such as migrations or planned system structural changes.

Getting started with data sources and integration

Once data has arrived at its final destination, preferably a centralized repository such as a cloud data warehouse, differences in formatting or structure based on the source should be smoothed out. The very first step towards this data integration goal, however, involves abstracting the initial data connections themselves — a complex task when accounting for the number of data sources accessible via the cloud.

Talend helps customers integrate data from thousands of internal and cloud-based sources, speeding up the journey from unmanageable, disparate systems, to a unified view of trusted enterprise data. Using a single suite of apps focused on data integrity and data integration, Talend Data Fabric improves and secures your data value-chain, from the very initial connection to a data source to effective analytics and business intelligence.

Try Talend Data Fabric today to seamlessly integrate with your data sources and gain insights from data you can trust.

FAQs

What is a data source example? ›

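Examples of machine-generated data sources include network traffic logs, system and application logs, output from sensors, event data from IoT devices, and database query results. This is a different sense from the ‘machine data source’ described earlier, which is a connection definition stored on a single machine; in that case, users refer to the data source name (DSN) as a shortcut to connect to the data.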

What is data source short answers? ›

A data source is simply the source of the data. It can be a file, a particular database on a DBMS, or even a live data feed. The data might be located on the same computer as the program, or on another computer somewhere on a network.

What are the main sources of data explain with examples? ›

Examples of primary data sources include surveys and questionnaires, direct observations, experiments, interviews, and focus groups. As a researcher, you must be familiar with primary data sources, which are original data collected firsthand specifically for your research purposes.

What are some examples of the most commonly used modern data sources? ›

Common sources of data in modern data pipelines:
  • Relational databases
  • NoSQL databases
  • Data warehouses
  • File systems
  • APIs
  • Messaging queues
  • Social media platforms
  • IoT devices and sensors

What is a data example? ›

Quantitative Data

It includes information like age, time, height, weight, etc. The data is collected by measuring it against the required parameters. For example, data on the number of students in your class who play different sports is in quantitative form.

What are the three 3 different types of data source? ›

What are Data Source Types?
  • Databases.
  • Flat files.
  • Web services.
  • Other sources such as RSS feeds.

How to identify data sources? ›

Identifying relevant data sources involves defining project goals, understanding required information, and considering available databases, literature, and experts. Evaluate the reliability and credibility of potential sources.

What is the data type with an example? ›

For example, a programmer might create a new data type named "complex number" that would include real and imaginary parts, or a color data type represented by three bytes denoting the amounts each of red, green, and blue, and a string representing the color's name.

What is a data source also known as? ›

In computing, a data source name (DSN, sometimes known as a database source name, though "data sources" can comprise other repositories apart from databases) is a string that has an associated data structure used to describe a connection to a data source.

What is your definition of data? ›

Data is information such as facts and numbers used to analyze something or make decisions. Computer data is information in a form that can be processed by a computer.

How do you choose sources of data? ›

What to look for when choosing a data source:
  1. High quality
  2. Continuously updated
  3. Relevant information
  4. Strong online presence

What is a common data source? ›

Some common data sources include: Surveys or questionnaires. Social media platforms. Medical records. Sales or financial reports.

What are some examples of big data sources? ›

Big data comes from many sources, including transaction processing systems, customer databases, documents, emails, medical records, internet clickstream logs, mobile apps and social networks.

What are five examples of primary sources of data? ›

Primary sources
  • diaries, correspondence, ships' logs.
  • original documents e.g. birth certificates, trial transcripts.
  • biographies, autobiographies, manuscripts.
  • interviews, speeches, oral histories.
  • case law, legislation, regulations, constitutions.
  • government documents, statistical data, research reports.

What is 1 example of a source for input data? ›

The source can be a database, a flat file, an XML file, or any other format that a system can read. The input is recorded as a collection of records that contain information used in the business process. That information can include customer details, accounting figures, sales, logistics, and more.

Article information

Author: Pres. Lawanda Wiegand